library(dplyr)
library(purrr)
library(readxl)
library(stringr)
library(janitor)
library(tidyverse)
library(rvest)
library(gridExtra)
library(ggrepel)
library(directlabels)
library(DT)
read_clean <- function(..., sheet){
read_excel(..., sheet = sheet)
}
options(scipen = 999999)
reorderFactors <- function(df, column = "my_column_name",
                           desired_level_order = c("fac1", "fac2", "fac3")) {
  x <- df[[column]]
  lvls_src <- levels(x)  # existing levels (the column is expected to be a factor)
  # Position of each desired level among the existing ones; absent levels are skipped
  idxs_target <- match(desired_level_order, lvls_src)
  idxs_target <- idxs_target[!is.na(idxs_target)]
  # Rebuild the factor so that its levels follow the desired order
  df[[column]] <- factor(x, levels = lvls_src[idxs_target])
  return(df)
}
Trend Analysis of Road Traffic Accidents of Turkey
Traffic Accident Death and Injury Data
This document is the final project for the STAT 570 course on data handling and visualization tools. Minor errors may be present, and the methods used may not be the optimal approach for this data set.
Levent Sarı - levent.sari@metu.edu.tr
Hüseyin Tan - huseyin.tan@metu.edu.tr
Introduction
In today’s world, traffic accidents are a serious public health issue. Every year, more than a million lives are lost on the world's roads and millions more people are injured. This situation affects not only individuals but also the well-being of society as a whole. The increasing frequency of traffic accidents underlines the importance of a safe transportation environment.
Traffic accidents continue to be a pressing issue worldwide, posing significant threats to public safety, economic stability, and overall well-being. In Turkey, a country with a dynamic transportation landscape marked by rapid urbanization and increased vehicular traffic, understanding and addressing the factors contributing to traffic accidents is of paramount importance.
With the rapidly growing use of transportation vehicles and roads, the causes and effects of accidents have become more complex. Factors such as driver errors, infrastructure deficiencies, weather conditions, and traffic congestion increase the likelihood of accidents. This underscores the need for more efforts in developing safe transportation systems and addressing existing issues effectively.
In this study, we use three data sets to examine the numbers of persons killed and injured in traffic accidents in Turkey between 2002 and 2022 by age group. We also present the estimated road traffic death rates of countries around the world and Turkey's position among them.
Our goals are to:
Convert untidy data sets into tidy data sets.
Create a new data set from the edited data sets.
Explain the data with graphics and tables.
Literature Review
Before conducting our analysis, we reviewed previous studies of road traffic accidents, injuries, and deaths in Turkey. Kaygisiz et al. (2017) define a road traffic accident (RTA) as an accident that involves people or vehicles and occurs on a road. Earlier studies reported that Turkey needed to collect accident data in a more organized manner to understand the various causes of accidents (Naci & Baker, 2008), and suggested that doing so could be key to reducing the losses they cause (Esiyok et al., 2005). More recent research (Erenler & Gumus, 2019) notes that RTAs are among the ten leading causes of death worldwide, ranking even higher in developing countries; the same study reports that RTAs are the leading cause of death for people aged 15 to 29. Many factors contribute to accidents, one of them being economic development: while Puvanachandra et al. (2012) state that lower-income regions have higher RTA fatalities, Erenler & Gumus (2019) suggest that developing countries experience more severe RTA fatalities. Looking at the causes on a per-accident basis, an epidemiological study by Sungur et al. (2014) reports that less than 1% of accidents are caused by non-human factors, while the driver is the responsible party 95% of the time; driving under the influence (DUI) and fatigue are among the most common causes. On the economic impact of RTAs, Naci & Baker (2008) estimate that road traffic deaths in 2000 cost Turkey's economy $2.6 billion in lost productivity alone. Finally, Ozturk (2022) reports that child fatalities also make a considerable contribution to Turkey's health burden.
Thus, our study uses three data sets obtained from TURKSTAT and the World Health Organization (WHO) to observe trends and to compare Turkey with other countries.
Data sets
Three different data sets were used in this project. Two of these data sets were taken from TURKSTAT and one from the WHO website.
The first TURKSTAT data set contains the following columns:
Accidents involving death and personal injury
Accidents involving material loss only
Total number of accidents
Year
The second TURKSTAT data set contains the following columns:
Killed Persons
Injured Persons
Age Groups
Year
The WHO data set contains the following columns:
Countries
Estimated road traffic death rate (per 100,000 population)
Year
TURKSTAT Raw Data Sets:
Screenshots of the data sets in their original format are shown below.
First Data Set
Second Data Set
TURKSTAT Data Set Problems:
The spreadsheets start and end with some descriptive text.
Column names are written separately in English and Turkish.
Some columns are left blank for visual purposes.
The age group data is split into two blocks on the same sheet.
The data is in wide rather than long format.
The WHO data set, on the other hand, is already tidy and ready for analysis.
Data Collection and Preprocessing
First, we will load all the libraries and define functions that will be needed throughout the study.
Here, the scipen = 999999 option suppresses scientific notation, which gives the graphs more readable axis labels. The user-defined reorderFactors function puts factor levels into the order we want on a plot axis. This is needed because ggplot2 orders a discrete axis by the factor levels rather than by row order, so sorting the data with base R functions such as order() does not change the order in a plot. The function looks up the position of each desired level among the existing levels and rebuilds the factor with its levels in that order.
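The point about factor levels can be checked on a small hypothetical data frame. The names df, group, n and sorted are illustrative only. Sorting the rows leaves the levels untouched, while rebuilding the factor with an explicit level order, which is what reorderFactors does, changes them.

```r
# Hypothetical toy data; not part of the TURKSTAT or WHO data
df <- data.frame(group = factor(c("b", "c", "a")), n = c(2, 3, 1))

# Sorting the rows does not touch the levels, so a ggplot2 axis would still read a, b, c
sorted <- df[order(df$n, decreasing = TRUE), ]
print(levels(sorted$group))

# Rebuilding the factor with an explicit level order changes the axis order
df$group <- factor(df$group, levels = c("c", "a", "b"))
print(levels(df$group))
```

In the tidyverse, forcats::fct_relevel() performs a similar reordering.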
After loading the required libraries, we download the first data set from the TURKSTAT website.
# Download the first TURKSTAT table into a temporary .xls file
url = 'https://data.tuik.gov.tr/Bulten/DownloadIstatistikselTablo?p=8RC9RpGXOVWg1rPaE6MQ4FUE37S8S2vsiIJglnqCOrpJfrRCUPa5n3wsXEnCI0Xf'
raw_data = tempfile(fileext = ".xls")
download.file(url, raw_data,
              method = "auto",
              mode = "wb")  # binary mode, needed for Excel files on Windows
sheets <- excel_sheets(raw_data)
# Read every sheet with read_clean() (defined at the top), skipping the two title rows
raw_data <- map(
  sheets,
  ~read_clean(raw_data,
              skip = 2,
              sheet = .)
) |>
  bind_rows()
head(raw_data, 10)
# A tibble: 10 × 8
...1 `Toplam kaza` `Maddi hasarlı` `Ölümlü, yaralanmalı` `Ölü sayısı (1)`
<chr> <chr> <chr> <chr> <chr>
1 <NA> sayısı kaza sayısı kaza sayısı Killed persons …
2 Yıl Total number Accidents involvi… Accidents involving … Toplam
3 Year of accidents material loss only and personal injury Total
4 2002 439777 374029 65748 4093
5 2003 455637 388606 67031 3946
6 2004 537352 460344 77008 4427
7 2005 620789 533516 87273 4505
8 2006 728755 632627 96128 4633
9 2007 825561 718567 106994 5007
10 2008 950120 845908 104212 4236
# ℹ 3 more variables: ...6 <chr>, ...7 <chr>, Yaralı <chr>
TURKSTAT tables usually begin with the table title and its translation in the first two rows, so we skip those rows with skip = 2 in read_clean(). A glimpse of the data shows that the column names and some rows still need further processing.
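# Keep the first four columns (the accident counts) and drop the first row,
# which only holds the second line of the Turkish column names
data_1 = raw_data %>% select(1:4) %>% slice(-1)
# Append a row joining the two English header fragments, e.g. "Total number" + "of accidents"
data_1 = rbind(data_1, paste(data_1[1,], data_1[2,]))
# Rows 3-23 hold the years 2002-2022; row 30 is the appended header row
data_2 = data_1 %>% slice(3:23, 30)
data_2 = data_2 %>% row_to_names(22, remove_rows_above = FALSE)
colnames(data_2)[1] = 'Year'
data_accidents = data_2; rm(data_1); rm(data_2); rm(raw_data)
# All columns were read as text; convert them to numbers
data_accidents = data_accidents %>%
  mutate(across(where(is.character), as.numeric))
The remaining columns of this sheet (killed and injured persons) are also available, broken down by age group, in the second TURKSTAT data set. We therefore keep only the first four columns, which are unique to this table, with select(1:4). The first row only contains the second line of the Turkish column names, so it is removed with slice(-1). We then paste the first and second rows together to build proper English column names, rename the first column to Year (replacing the combined Turkish/English label), keep only the rows that contain data, and convert the columns to numeric before moving on to the second TURKSTAT data set.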
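The header-merging step can be seen in isolation on a hypothetical two-row header. The fragments below imitate the layout and are not read from the file. Pasting the two rows element-wise yields one complete name per column.

```r
# Hypothetical header fragments split over two rows, as in the TURKSTAT sheet
hdr <- data.frame(a = c("Total number", "of accidents"),
                  b = c("Accidents involving", "material loss only"),
                  stringsAsFactors = FALSE)

# Paste row 1 and row 2 column by column to obtain full column names
merged <- paste(unlist(hdr[1, ]), unlist(hdr[2, ]))
print(merged)
```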
url = 'https://data.tuik.gov.tr/Bulten/DownloadIstatistikselTablo?p=2/Sym42hc5kOF437mqcxligj8l5uHDGvvOSKXfSdmBmVHwkyus9lGDyc36ojWVBg'
raw_data = tempfile(fileext = ".xls")
download.file(url, raw_data,
method = "auto",
mode = "wb")
sheets <- excel_sheets(raw_data)
raw_data <- map(
sheets,
~read_clean(raw_data,
skip = 2,
sheet = .)
) |>
bind_rows()
head(raw_data, 30)
# A tibble: 30 × 12
...1 Yaş grupları - Age gr…¹ ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
<chr> <chr> <chr> <lgl> <chr> <chr> <lgl> <chr> <chr> <lgl>
1 <NA> 0 - 9 <NA> NA 10 -… <NA> NA 15 -… <NA> NA
2 Yıl Ölü sayısı (1) Yara… NA Ölü … Yara… NA Ölü … Yara… NA
3 Year Killed persons (1) Inju… NA Kill… Inju… NA Kill… Inju… NA
4 2002 322 8788 NA 84 4524 NA 68 4572 NA
5 2003 178 7149 NA 68 3992 NA 67 4533 NA
6 2004 225 8148 NA 80 4642 NA 66 5152 NA
7 2005 179 9077 NA 108 5988 NA 58 6095 NA
8 2006 178 9237 NA 89 6133 NA 74 6673 NA
9 2007 179 10333 NA 89 6790 NA 95 7337 NA
10 2008 151 9486 NA 80 6689 NA 65 6930 NA
# ℹ 20 more rows
# ℹ abbreviated name: ¹`Yaş grupları - Age groups`
# ℹ 2 more variables: ...11 <chr>, ...12 <chr>
For the second TURKSTAT data set, the reading process is repeated. The first 30 rows show that the data is split into two blocks stacked vertically. To solve this problem, we split the data into two parts, treating the last year, 2022, as the last row of each part. We also remove the all-NA columns that only serve as spacing in the Excel sheet.
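The splitting rule can be sketched on a hypothetical first column, not the real sheet. which() returns the positions where the year equals 2022, min() picks the first one, and that position separates the two blocks. The names col1, last_row, block_1 and block_2 are illustrative.

```r
# Hypothetical first column of a sheet with two stacked blocks, each ending in 2022
col1 <- c(NA, "Year", "2021", "2022", NA, "Year", "2021", "2022")

last_row <- min(which(col1 == "2022"))  # last row of the first block (which() skips NAs)
block_1 <- col1[seq_len(last_row)]
block_2 <- col1[-seq_len(last_row)]
print(block_1)
print(block_2)
```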
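data_1 = raw_data %>%
  slice(1:49) %>%
  filter(row_number() <= min(which(...1 == 2022))) %>%  # first block: up to the first 2022 row
  select(where(~!all(is.na(.)))) %>%                     # drop all-NA spacer columns
  slice(-2)                                              # drop the Turkish "Ölü sayısı / Yaralı" row
head(data_1)
# A tibble: 6 × 9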
...1 `Yaş grupları - Age groups` ...3 ...5 ...6 ...8 ...9 ...11 ...12
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 <NA> 0 - 9 <NA> 10 -… <NA> 15 -… <NA> 18 -… <NA>
2 Year Killed persons (1) Injured… Kill… Inju… Kill… Inju… Kill… Inju…
3 2002 322 8788 84 4524 68 4572 129 7143
4 2003 178 7149 68 3992 67 4533 120 6880
5 2004 225 8148 80 4642 66 5152 122 7984
6 2005 179 9077 108 5988 58 6095 140 8990
The head of the data shows that each age group is labeled only in the first of its two columns (Killed and Injured). We want to merge the first two rows into column names, as we did for the first TURKSTAT data set. Before doing so, we fill each NA label with the nearest non-NA value to its left in that row.
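The row-wise fill can also be written without a loop. The sketch below uses a hypothetical helper, fill_right(), on an illustrative header vector. Applying cummax() to the positions of the non-NA entries gives, for every cell, the index of the last non-NA value at or before it.

```r
# Hypothetical helper: carry the last non-NA value to the right along a vector
fill_right <- function(v) {
  # position of each non-NA entry, 0 for NA entries
  pos <- ifelse(is.na(v), 0L, seq_along(v))
  # running maximum = index of the most recent non-NA entry (0 if none yet)
  idx <- cummax(pos)
  idx[idx == 0L] <- NA
  v[idx]
}

hdr <- c(NA, "0 - 9", NA, "10 - 14", NA)
print(fill_right(hdr))
```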
# Carry each age-group label to the right so that both its Killed and its Injured column are labeled
for (i in 2:ncol(data_1)) {
  if (is.na(data_1[1, i])) {
    if (!is.na(data_1[1, i - 1])) {
      data_1[1, i] = data_1[1, i - 1]
    }
  }
}
# Combine the label row and the Killed/Injured row into a single header row
data_1 = rbind(paste(data_1[1,], data_1[2,]), data_1)
data_1 = data_1 %>%
  slice(-2, -3) %>%
  row_to_names(1, remove_rows_above = FALSE)
colnames(data_1)[1] = 'Year'
Since the data is in wide format, we used a regex pattern and the pivot_longer() function to transform it into long format.
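To see what the names_pattern captures, consider a hypothetical one-row tibble whose headers follow the merged "<age group> <Killed|Injured> persons" form. The first two values are the 2002 figures for ages 0-9 from the table above. The 65 + column and its value are illustrative. Headers that do not match the pattern yield NA in both new columns.

```r
library(tibble)
library(tidyr)

# Hypothetical extract with headers in the merged form produced above
toy <- tibble(Year = "2002",
              `0 - 9 Killed persons (1)` = "322",
              `0 - 9 Injured persons` = "8788",
              `65 + Killed persons (1)` = "0")  # illustrative value

long <- pivot_longer(toy,
                     cols = -Year,
                     names_to = c("Age_Group", "Killed_Or_Injured"),
                     names_pattern = "(\\d+\\s-\\s\\d+)\\s(\\w+)\\spersons")
print(long)
```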
data_transformed_1 = data_1 %>%
pivot_longer(cols = -Year,
names_to = c("Age_Group", "Killed_Or_Injured"),
names_pattern = "(\\d+\\s-\\s\\d+)\\s(\\w+)\\spersons") %>%
mutate(Age_Group = gsub("-", "_", Age_Group)) %>%
  filter(!is.na(value))
The same process is repeated for the second part of the data. However, the age groups ‘65+’ and ‘Unknown’ do not fit the "number - number" format captured by the regex above. For this part, we therefore capture whatever label precedes "Killed persons" or "Injured persons" and then recode the labels with case_when().
data_2 = raw_data %>%
  slice(1:49) %>%
  filter(row_number() > min(which(...1 == 2022))) %>%  # second block: rows after the first 2022
  select(where(~!all(is.na(.)))) %>%
  slice(-1, -3)
# Carry each age-group label to the right, as for the first part
for (i in 2:ncol(data_2)) {
  if (is.na(data_2[1, i])) {
    if (!is.na(data_2[1, i - 1])) {
      data_2[1, i] = data_2[1, i - 1]
    }
  }
}
data_2 = rbind(paste(data_2[1,], data_2[2,]), data_2)
data_2 = data_2 %>%
  slice(-2, -3) %>%
  row_to_names(1, remove_rows_above = FALSE)
colnames(data_2)[1] = 'Year'
# Capture any label in front of "Killed persons"/"Injured persons", then recode it:
# labels containing "+" become '65 +', "a - b" ranges become "a _ b", anything else 'Unknown'
data_transformed_2 = data_2 %>%
  pivot_longer(cols = -Year,
               names_to = c("Age_Group", "Killed_Or_Injured"),
               names_pattern = "^(.*)\\s(Killed|Injured)\\spersons") %>%
  mutate(Age_Group = trimws(Age_Group),
         Age_Group = case_when(
           grepl("+", Age_Group, fixed = TRUE) ~ "65 +",
           grepl("^\\d+\\s-\\s\\d+", Age_Group) ~ gsub("-", "_", Age_Group),
           TRUE ~ "Unknown")) %>%
  filter(!is.na(value), !is.na(Killed_Or_Injured))
The two parts of the data are then merged.
data_fin = rbind(data_transformed_1, data_transformed_2)
data_fin = data_fin %>%
  mutate(Year = as.numeric(Year),
         value = as.numeric(value))
# Attach the yearly accident totals from the first TURKSTAT table to every row
data_accidents_full = data_fin %>%
  left_join(data_accidents, by = 'Year')
rm(data_1, data_2, data_accidents, data_fin, data_transformed_1, data_transformed_2, raw_data)
Lastly, to see how Turkey's road traffic death rate compares with those of other countries, the third data set is downloaded from the WHO website and read into R.
# The link for the following data: https://www.who.int/data/gho/data/indicators/indicator-details/GHO/estimated-road-traffic-death-rate-(per-100-000-population)
# The CSV is downloaded from the page above and saved locally as data.csv
who_data = read.csv('data.csv')
who_data_filtered = who_data %>%
  select(Location, Period, Value) %>%
  # keep only the first space-separated token of Value and drop the rest without a warning
  separate(Value, into = c('Value'), sep = ' ', extra = 'drop'); rm(who_data)
who_data_filtered = who_data_filtered %>%
  mutate(Period = as.factor(Period),
         Value = as.numeric(Value))
head(who_data_filtered)
                          Location Period Value
1 Antigua and Barbuda 2019 0.00
2 Micronesia (Federated States of) 2019 0.16
3 Maldives 2019 1.63
4 Kiribati 2019 1.92
5 Egypt 2019 10.10
6 Ukraine 2019 10.20
As seen in the code, no pre-processing apart from selecting useful columns and fixing their classes is needed for the WHO data as it is already tidy.
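The separate() call above keeps only the first space-separated token of Value. Assume the WHO values arrive as an estimate followed by a bracketed interval; the strings below are hypothetical examples of that form, not rows from data.csv. Under that assumption, base R's sub() gives the same result:

```r
# Hypothetical values in an "estimate [lower-upper]" form
raw_values <- c("10.1 [8.9-11.4]", "0 [0-0]")

# Drop everything from the first space onwards, then convert to numeric
estimates <- as.numeric(sub(" .*", "", raw_values))
print(estimates)
```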
Data Visualization and Conclusions
In the exploratory analysis, various tables and plots can be created with the help of the tidyverse and DT packages.
- Yearly Number of Accidents in Turkey
To view the yearly accidents in Turkey together with their summary statistics, we first create two data tables. For the first one, a frequency table, selecting the desired columns and piping them into the datatable() function is sufficient.
data_accidents_full %>%
select(Year,`Total number of accidents`,`Accidents involving material loss only`,`Accidents involving death and personal injury`) %>%
unique() %>%
datatable(class = "compact",
caption = 'Yearly Number of Accidents\nin Turkey',
options = list(pageLength = nrow(.)))
For the summary statistics table, we first select the columns we want to work with, calculate their summary statistics, convert the output into a data frame, and then pass the data frame to the datatable() function.
data_accidents_full %>%
select(`Total number of accidents`,`Accidents involving material loss only`,`Accidents involving death and personal injury`) %>%
summary() %>%
as.data.frame() %>%
select(Var2, Freq) %>%
datatable(class = 'compact',
options = list(pageLength = nrow(.)),
colnames = c('Type','Sum. Stat.'),
caption = 'Summary Statistics of Accidents in Turkey between 2002-2022')
We will also visualize the data to make it more accessible.
data_accidents_full %>%
select(Year,`Total number of accidents`,`Accidents involving material loss only`,`Accidents involving death and personal injury`) %>%
group_by(Year) %>%
mutate(Year = as.factor(Year)) %>%
unique() %>%
gather('Accident','Count',-Year) %>%
filter(!Accident=='Total number of accidents') %>%
ggplot(.,aes(x = Year,y = Count, fill = Accident, group = Accident)) +
geom_bar(stat = 'identity') +
geom_text(aes(label = format(after_stat(y), big.mark = ".", scientific = FALSE), group = Accident), fontface = 'bold',
stat = 'summary', fun = sum, vjust = 0.5,hjust = 1.4, position = position_stack()) +
theme_minimal() +
labs(x = 'Year', y = 'Count', title = 'Accidents per Year in Turkey', subtitle = '2002-2022') +
theme(axis.text.x = element_text(size = 12, face = 'bold'),
axis.title.x = element_text(size = 13, face = 'bold'),
axis.text.y = element_text(size = 12, face = 'bold'),
axis.title.y = element_text(size = 13, face = 'bold'),
legend.position = 'right',
title = element_text(size = 14, face = 'bold'),
plot.subtitle = element_text(size = 13, face = 'italic'),
plot.background = element_rect(fill = '#F4F4F4'),
panel.background = element_rect(fill = '#F4F4F4'),
strip.background = element_rect(fill = '#F4F4F4')) +
scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
coord_flip() +
scale_fill_manual(values = c('#ea7286','#a9c484'))
The plot above shows the yearly numbers of accidents involving death or personal injury and of accidents involving material loss only in Turkey. Between 2002 and 2012, the total number of accidents increased every year. Accidents involving material loss only peaked in 2012, while accidents involving death and personal injury peaked in 2022. The table below lists the number of persons killed and injured in each age group by year.
data_accidents_full %>%
select(Year,Age_Group,Killed_Or_Injured,value) %>%
pivot_wider(names_from = c("Killed_Or_Injured", "Age_Group"),values_from = "value") %>%
rename_with(~ gsub("Killed_", "Killed Age: ", .), starts_with("Killed_")) %>%
rename_with(~ gsub("Injured_", "Injured Age: ", .), starts_with("Injured_")) %>%
datatable(class = "compact",
caption = 'Yearly Killed or Injured Persons by Age Groups\nin Turkey',
options = list(pageLength = nrow(.)))
We also examine the shares of deaths and injuries among all persons killed or injured in accidents involving death or personal injury. For this, it is enough to compute the killed and injured percentages with mutate() before gathering the data for ggplot.
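Below is a minimal sketch of the percentage step on hypothetical yearly totals (not TURKSTAT figures). It uses pivot_wider(), the current replacement for spread(). Each share is the count divided by killed plus injured, times 100.

```r
library(dplyr)
library(tidyr)

# Hypothetical yearly totals in long format
counts <- tibble(Year = c(2002, 2002, 2003, 2003),
                 Killed_Or_Injured = c("Killed", "Injured", "Killed", "Injured"),
                 total_count = c(2, 98, 3, 97))

shares <- counts |>
  pivot_wider(names_from = Killed_Or_Injured, values_from = total_count) |>
  mutate(Total = Killed + Injured,
         Killed_percentage = Killed / Total * 100,
         Injured_percentage = Injured / Total * 100)
print(shares)
```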
data_accidents_full %>%
group_by(Year, Killed_Or_Injured) %>%
mutate(Year = as.factor(Year)) %>%
summarise(total_count = sum(value,na.rm = T)) %>%
spread(Killed_Or_Injured, total_count, fill = 0) %>%
mutate(Total = Killed + Injured,
Killed_percentage = (Killed / Total) * 100,
Injured_percentage = (Injured / Total) * 100) %>%
select(Year, Killed_percentage, Injured_percentage) %>%
gather('Type','Percentage',-Year) %>%
ggplot(.,aes(x = Year,y = `Percentage`, fill = Type)) +
geom_bar(stat = 'identity') +
geom_text(aes(label = paste0('% ', round(Percentage,2))),fontface = 'bold',size = 5,hjust = 0.3) +
theme_minimal() +
labs(x = 'Year', y = 'Percentage', title = 'Death/Injury Percentages per Year in Turkey', subtitle = '2002-2022') +
theme(axis.text.x = element_text(size = 12, face = 'bold'),
axis.title.x = element_text(size = 13, face = 'bold'),
axis.text.y = element_text(size = 12, face = 'bold'),
axis.title.y = element_text(size = 13, face = 'bold'),
legend.position = 'right',
title = element_text(size = 14, face = 'bold'),
plot.subtitle = element_text(size = 13, face = 'italic'),
plot.background = element_rect(fill = '#F4F4F4'),
panel.background = element_rect(fill = '#F4F4F4'),
strip.background = element_rect(fill = '#F4F4F4')) +
scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
coord_flip() +
scale_fill_manual(values = c('#eab281','#ea7286'))
This chart shows deaths and injuries as percentages of all casualties in each year. Each year, between one and three percent of the people killed or injured in traffic accidents died. To examine the trends in more detail, we created line graphs of deaths and injuries per age group, as well as of overall yearly deaths and injuries. We plot the 25-64 age group separately because its counts far exceed those of the other groups. Then, using the grid.arrange() function from the gridExtra library, we combine the plots into a single output.
pa1 = data_accidents_full %>%
select(Age_Group,Year,value) %>%
unique() %>%
filter(Year %in% c(2002:2020)) %>%
filter(!Age_Group == '25 _ 64') %>%
mutate(Year = as.factor(Year)) %>%
group_by(Age_Group,Year) %>%
summarise(value = sum(value,na.rm = T)) %>%
mutate(label = if_else(Year == 2020, as.character(Age_Group), NA_character_)) %>%
ggplot(.,aes(x = Year, y = value, group = Age_Group, color = Age_Group)) +
geom_line(size = 1) +
geom_point(size = 2.2) +
theme_minimal() +
geom_label_repel(aes(label = label),
nudge_x = 1,
na.rm = TRUE,
fontface = 'bold') +
scale_color_manual(values = c('#eab281','#e3e19f','#a9c484','#5d937b','#58525a','#a07ca7','#f4a4bf'))+
scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
labs(x = 'Year', y = 'Persons', title = 'Persons Killed/Injured in Accidents', subtitle = '2002-2020, All Age Groups\nNot Including 25-64') +
theme(axis.text.x = element_text(size = 12, face = 'bold'),
axis.title.x = element_text(size = 13, face = 'bold'),
axis.text.y = element_text(size = 12, face = 'bold'),
axis.title.y = element_text(size = 13, face = 'bold'),
legend.position = 'none',
title = element_text(size = 14, face = 'bold'),
plot.subtitle = element_text(size = 13, face = 'italic'),
plot.background = element_rect(fill = '#F4F4F4'),
panel.background = element_rect(fill = '#F4F4F4'),
strip.background = element_rect(fill = '#F4F4F4'))
pa2 = data_accidents_full %>%
select(Age_Group,Year,value) %>%
unique() %>%
filter(Year %in% c(2002:2020)) %>%
filter(Age_Group == '25 _ 64') %>%
mutate(Year = as.factor(Year)) %>%
group_by(Age_Group,Year) %>%
summarise(value = sum(value,na.rm = T)) %>%
ggplot(.,aes(x = Year, y = value, group = 1, color = '#eab281')) +
geom_line(size = 1) +
geom_point(size = 2.2) +
theme_minimal() +
scale_color_manual(values = c('#ea7286')) +
scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
labs(x = 'Year', y = 'Persons', title = 'Persons Killed/Injured in Accidents', subtitle = '2002-2020, Ages 25-64') +
theme(axis.text.x = element_text(size = 12, face = 'bold'),
axis.title.x = element_text(size = 13, face = 'bold'),
axis.text.y = element_text(size = 12, face = 'bold'),
axis.title.y = element_text(size = 13, face = 'bold'),
legend.position = 'none',
title = element_text(size = 14, face = 'bold'),
plot.subtitle = element_text(size = 13, face = 'italic'),
plot.background = element_rect(fill = '#F4F4F4'),
panel.background = element_rect(fill = '#F4F4F4'),
strip.background = element_rect(fill = '#F4F4F4'))
grid.arrange(pa1, pa2, ncol = 1)
As the plots show, all age groups follow a similar trend except the 65+ group. In the first graph, the 21-24 age group has the highest number of persons killed or injured, and the 65+ group has the lowest.
Similarly, the next two line plots show the numbers of persons killed and injured in Turkey to visualize the overall trend for the country.
p1 = data_accidents_full %>%
filter(Year %in% c(2002:2020)) %>%
mutate(Year = as.factor(Year)) %>%
filter(Killed_Or_Injured == 'Killed') %>%
select(Year,`Accidents involving death and personal injury`,Killed_Or_Injured,value) %>%
group_by(Year,Killed_Or_Injured) %>%
summarise(value = sum(value)) %>%
ggplot(.,aes(x = Year, y = value, group = Killed_Or_Injured, color = Killed_Or_Injured)) +
geom_line(size = 1) +
geom_point(size = 2.2) +
geom_text(aes(label = format(after_stat(y), big.mark = ".", scientific = FALSE), fontface = 'bold'),
stat = 'summary', fun = sum, vjust = 0.5,hjust = -0.4, color = 'black') +
scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
theme_minimal() +
labs(x = 'Year', y = 'Persons', title = 'Persons Killed in Accidents', subtitle = '2002-2020') +
theme(axis.text.x = element_text(size = 12, face = 'bold'),
axis.title.x = element_text(size = 13, face = 'bold'),
axis.text.y = element_text(size = 12, face = 'bold'),
axis.title.y = element_text(size = 13, face = 'bold'),
legend.position = 'none',
title = element_text(size = 14, face = 'bold'),
plot.subtitle = element_text(size = 13, face = 'italic'),
plot.background = element_rect(fill = '#F4F4F4'),
panel.background = element_rect(fill = '#F4F4F4'),
strip.background = element_rect(fill = '#F4F4F4'))
p2 = data_accidents_full %>%
filter(Year %in% c(2002:2020)) %>%
mutate(Year = as.factor(Year)) %>%
filter(Killed_Or_Injured == 'Injured') %>%
select(Year,`Accidents involving death and personal injury`,Killed_Or_Injured,value) %>%
group_by(Year,Killed_Or_Injured) %>%
summarise(value = sum(value)) %>%
ggplot(.,aes(x = Year, y = value, group = Killed_Or_Injured, color = Killed_Or_Injured)) +
geom_line(size = 1) +
geom_point(size = 2.2) +
geom_text(aes(label = format(after_stat(y), big.mark = ".", scientific = FALSE), fontface = 'bold'),
stat = 'summary', fun = sum, vjust = 0.5,hjust = -0.4, color = 'black') +
theme_minimal() +
scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
labs(x = 'Year', y = 'Persons', title = 'Persons Injured in Accidents', subtitle = '2002-2020') +
theme(axis.text.x = element_text(size = 12, face = 'bold'),
axis.title.x = element_text(size = 13, face = 'bold'),
axis.text.y = element_text(size = 12, face = 'bold'),
axis.title.y = element_text(size = 13, face = 'bold'),
legend.position = 'none',
title = element_text(size = 14, face = 'bold'),
plot.subtitle = element_text(size = 13, face = 'italic'),
plot.background = element_rect(fill = '#F4F4F4'),
panel.background = element_rect(fill = '#F4F4F4'),
strip.background = element_rect(fill = '#F4F4F4'))
grid.arrange(p1, p2, ncol = 1)
These graphs show the numbers of persons killed and injured in traffic accidents separately. Neither series follows a stable trend. The number of deaths doubled in 2015 compared with the previous year. The reason is given in the footnotes of the TURKSTAT table, which we removed during cleaning:
Until 2015, figures on persons killed include only deaths at the accident scene. Since 2015, they also include deaths within 30 days of the accident among persons who were injured and sent to health facilities.
- Yearly Death Rates of Countries Comparison
Lastly, using the data obtained from the WHO, we will observe how Turkey's road traffic death rate compares with those of selected countries. First, we look at the estimated yearly death rates per 100,000 people in Turkey.
who_data_filtered %>%
filter(Location=='Türkiye') %>%
ggplot(.,aes(x = Period,y = `Value`, fill = '#ea7286')) +
geom_bar(stat = 'identity') +
geom_text(aes(label = Value),fontface = 'bold',size = 5,hjust = -0.3) +
theme_minimal() +
labs(x = 'Year', y = 'Death Rate per 100.000 People', title = 'Road Traffic Death Rates per 100.000 People in Turkey', subtitle = 'Estimated, 2000-2019') +
theme(axis.text.x = element_text(size = 12, face = 'bold'),
axis.title.x = element_text(size = 13, face = 'bold'),
axis.text.y = element_text(size = 12, face = 'bold'),
axis.title.y = element_text(size = 13, face = 'bold'),
legend.position = 'none',
title = element_text(size = 14, face = 'bold'),
plot.subtitle = element_text(size = 13, face = 'italic'),
plot.background = element_rect(fill = '#F4F4F4'),
panel.background = element_rect(fill = '#F4F4F4'),
strip.background = element_rect(fill = '#F4F4F4')) +
coord_flip() +
scale_fill_manual(values = c('#eab281'))
The first plot shows that the rate mostly declined after 2000 to between 6 and 7 deaths per 100,000 people. It then peaked in 2011 at almost double the previous year's value. Afterwards, it declined steadily until 2019, when it appears to have returned to pre-2011 levels.
Next, we compare Turkey's road traffic death rates with those of the G7 countries.
who_data_filtered %>%
filter(Location %in% c('Türkiye', 'Canada', 'France', 'Germany', 'Italy', 'Japan',
                       'United Kingdom of Great Britain and Northern Ireland',
                       'United States of America')) %>%
mutate(label = if_else(Period == '2019', as.character(Location), NA_character_)) %>%
ggplot(.,aes(x = Period, y = Value, group = Location, color = Location)) +
geom_line(size = 1.2) +
geom_point(size = 2.2) +
theme_minimal() +
geom_label_repel(aes(label = label),
nudge_x = 1,
na.rm = TRUE,
fontface = 'bold') +
scale_color_manual(values = c('#ea7286','#eab281','#e3e19f','#a9c484','#5d937b','#58525a','#a07ca7','#f4a4bf'))+
scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
labs(x = 'Year', y = 'Death Rate per 100.000 People', title = 'Death Rates per 100.000 People', subtitle = '2000-2019, Turkey & G7 Countries') +
theme(axis.text.x = element_text(size = 12, face = 'bold'),
axis.title.x = element_text(size = 13, face = 'bold'),
axis.text.y = element_text(size = 12, face = 'bold'),
axis.title.y = element_text(size = 13, face = 'bold'),
legend.position = 'none',
title = element_text(size = 14, face = 'bold'),
plot.subtitle = element_text(size = 13, face = 'italic'),
plot.background = element_rect(fill = '#F4F4F4'),
panel.background = element_rect(fill = '#F4F4F4'),
strip.background = element_rect(fill = '#F4F4F4'))
Compared with the G7 countries, Turkey had death rates similar to those of the European countries and Japan until 2011. The sharp spike in 2011 then separated Turkey from these countries and brought it close to the level of the USA. The USA consistently had higher death rates than the European G7 countries, Turkey, and Japan. To show the trend in Turkey more clearly, we can also highlight Turkey against the other countries.
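With an unnamed values vector, scale_color_manual() assigns colours to the scale's levels in order (alphabetical for these country names). The highlight in the next plot therefore depends on Türkiye being the sixth country alphabetically. A named vector removes that dependency. The sketch below builds one; the object name highlight is our own. It could then be passed as scale_color_manual(values = highlight).

```r
# Countries shown in the comparison plots
countries <- c("Canada", "France", "Germany", "Italy", "Japan", "Türkiye",
               "United Kingdom of Great Britain and Northern Ireland",
               "United States of America")

# One colour per country, matched by name rather than by position
highlight <- setNames(ifelse(countries == "Türkiye", "darkred", "grey"), countries)
print(highlight)
```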
who_data_filtered %>%
filter(Location %in% c('Türkiye', 'Canada', 'France', 'Germany', 'Italy', 'Japan',
                       'United Kingdom of Great Britain and Northern Ireland',
                       'United States of America')) %>%
mutate(label = if_else(Period == '2019', as.character(Location), NA_character_)) %>%
ggplot(.,aes(x = Period, y = Value, group = Location, color = Location)) +
geom_line(size = 1.2) +
geom_point(size = 2.2) +
theme_minimal() +
geom_label_repel(aes(label = label),
nudge_x = 1,
na.rm = TRUE,
fontface = 'bold') +
scale_color_manual(values = c('grey','grey','grey','grey','grey','darkred','grey','grey'))+
scale_y_continuous(labels=function(x) format(x, big.mark = ".", scientific = FALSE)) +
labs(x = 'Year', y = 'Death Rate per 100.000 People', title = 'Death Rates per 100.000 People', subtitle = '2000-2019, Turkey & G7 Countries') +
theme(axis.text.x = element_text(size = 12, face = 'bold'),
axis.title.x = element_text(size = 13, face = 'bold'),
axis.text.y = element_text(size = 12, face = 'bold'),
axis.title.y = element_text(size = 13, face = 'bold'),
legend.position = 'none',
title = element_text(size = 14, face = 'bold'),
plot.subtitle = element_text(size = 13, face = 'italic'),
plot.background = element_rect(fill = '#F4F4F4'),
panel.background = element_rect(fill = '#F4F4F4'),
strip.background = element_rect(fill = '#F4F4F4'))
We also compute the 20-year (2000-2019) averages for these countries, along with the countries that have the highest and lowest averages overall. First, we filter the data and sort the countries in decreasing order.
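Selecting the lowest and highest averages can be sketched on hypothetical rates. The locations A, B and C and their values are illustrative. summarise() computes one mean per location, and range() returns the minimum and maximum in a single call.

```r
library(dplyr)

# Hypothetical yearly rates for three locations
rates <- data.frame(Location = rep(c("A", "B", "C"), each = 2),
                    Value = c(1, 3, 10, 12, 5, 7))

averages <- rates |>
  group_by(Location) |>
  summarise(Value = mean(Value))

# Keep the locations with the lowest and the highest average
extremes <- averages |> filter(Value %in% range(Value))
print(extremes)
```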
who_data_filtered_plot = who_data_filtered %>%
group_by(Location) %>%
summarise(Value = mean(Value)) %>%
filter(Location %in% c('Türkiye', 'Canada', 'France', 'Germany', 'Italy', 'Japan',
                       'United Kingdom of Great Britain and Northern Ireland',
                       'United States of America') |
       Value == min(Value) | Value == max(Value))
who_data_filtered_plot$Location = as.factor(who_data_filtered_plot$Location)
who_data_filtered_plot[order(who_data_filtered_plot$Value, decreasing = TRUE), ]
# A tibble: 10 × 2
Location Value
<fct> <dbl>
1 Thailand 36.8
2 United States of America 13.6
3 Italy 8.58
4 Türkiye 8.11
5 France 7.67
6 Canada 7.49
7 Japan 6.74
8 Germany 5.90
9 United Kingdom of Great Britain and Northern Ireland 4.75
10 Maldives 2.16
For extra safety, we will use the pre-defined reorderFactors function to order their levels, allowing us to use the data in ggplot as is.
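Base R's reorder() offers an alternative that avoids typing the level order by hand. It sorts factor levels by a summary (the mean, by default) of another variable. The sketch uses three of the averages printed above:

```r
# Three of the 20-year averages from the table above
avg <- data.frame(Location = c("Thailand", "Italy", "Maldives"),
                  Value = c(36.8, 8.58, 2.16))

# Levels in increasing order of Value, so coord_flip() would draw the largest bar on top
avg$Location <- reorder(factor(avg$Location), avg$Value)
print(levels(avg$Location))
```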
who_data_filtered_plot = reorderFactors(who_data_filtered_plot,'Location',c('Maldives','United Kingdom of Great Britain and Northern Ireland','Germany',
'Japan','Canada','France','Türkiye','Italy','United States of America','Thailand'))
Then, we create the plot with ggplot and highlight the countries of interest.
who_data_filtered_plot %>%
ggplot(.,aes(x = Location,y = `Value`, fill = Location)) +
geom_bar(stat = 'identity') +
geom_text(aes(label = round(Value, 2)), fontface = 'bold', size = 5, hjust = -0.3) +
theme_minimal() +
labs(x = 'Country', y = 'Average Death Rate per 100.000 People', title = 'Average Road Traffic Death Rates per 100.000 People', subtitle = '2000-2019') +
scale_y_continuous(limits = c(0,45)) +
theme(axis.text.x = element_text(size = 12, face = 'bold'),
axis.title.x = element_text(size = 13, face = 'bold'),
axis.text.y = element_text(size = 12, face = 'bold'),
axis.title.y = element_text(size = 13, face = 'bold'),
legend.position = 'none',
title = element_text(size = 14, face = 'bold'),
plot.subtitle = element_text(size = 13, face = 'italic'),
plot.background = element_rect(fill = '#F4F4F4'),
panel.background = element_rect(fill = '#F4F4F4'),
strip.background = element_rect(fill = '#F4F4F4')) +
coord_flip() +
scale_fill_manual(values = c('#a9c484','grey','grey','grey','grey','grey','#eab281','grey','grey','darkred'))
The last plot shows that, on average over the 20-year span, Turkey's road traffic death rate has been close to those of the European countries. Maldives, the country with the lowest average, has a rate of roughly one fourth of Turkey's. Thailand, the country with the highest average, has a rate more than four times Turkey's.
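The ratios quoted above follow from the printed averages (Maldives 2.16, Türkiye 8.11, Thailand 36.8). Since these are rounded values, the results are approximate:

```r
turkey <- 8.11
maldives <- 2.16
thailand <- 36.8

ratio_low <- round(maldives / turkey, 2)   # 0.27, roughly one fourth
ratio_high <- round(thailand / turkey, 2)  # 4.54, more than four times
print(c(ratio_low, ratio_high))
```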
References
Erenler, A. K., & Gumus, B. (2019). Analysis of Road Traffic Accidents in Turkey between 2013 and 2017. Medicina, 55(10), 679. https://doi.org/10.3390/medicina55100679
Esiyok, B., Korkusuz, I., Canturk, G., Alkan, H. A., Karaman, A. G., & Hamit Hanci, I. (2005). Road traffic accidents and disability: A cross-section study from Turkey. Disability and Rehabilitation, 27(21), 1333–1338. https://doi.org/10.1080/09638280500164867
Kaygisiz, O., Senbil, M., & Yildiz, A. (2017). Influence of urban built environment on traffic accidents: The case of Eskisehir (Turkey). Case Studies on Transport Policy, 5(2), 306–313. https://doi.org/10.1016/j.cstp.2017.02.002
Naci, H., & Baker, T. D. (2008). Productivity losses from road traffic deaths in Turkey. International Journal of Injury Control and Safety Promotion, 15(1), 19–24. https://doi.org/10.1080/17457300701847648
Ozturk, E. A. (2022). Burden of deaths from road traffic injuries in children aged 0-14 years in Turkey. Eastern Mediterranean Health Journal, 28(4), 272–280. https://doi.org/10.26719/emhj.22.013
Puvanachandra, P., Hoe, C., Ozkan, T., & Lajunen, T. (2012). Burden of Road Traffic Injuries in Turkey. Traffic Injury Prevention, 13(sup1), 64–75. https://doi.org/10.1080/15389588.2011.633135
Sungur, I., Akdur, R., & Piyal, B. (2014). Analysis of Traffic Accidents in Turkey. Ankara Medical Journal, 14(3). https://doi.org/10.17098/amj.65427